In this homework, I worked with a dataset “IE360_Spring22_HW2_data.csv”. The dataset contains . This file contains quarterly gasoline and diesel sales (in 1000 m3) of a major distributor between 2000 and 2006.
My main objective is to forecast the sales of UGS for every quarter of 2007 by using time series regression methods with the use of other independent variables in the dataset given.
I divided the solution into six parts according to the questions given.
# Necessary libraries to be used
library(data.table)
library(lubridate)
library(forecast)
library(zoo)
library(fpp)
library(ggplot2)
library(xts)
library(TSstudio) # if necessary, please install the package
library(RcppRoll)
library(tsbox)
Warning message:
"package 'data.table' was built under R version 3.6.3"Warning message:
"package 'lubridate' was built under R version 3.6.3"
Attaching package: 'lubridate'
The following objects are masked from 'package:data.table':
hour, isoweek, mday, minute, month, quarter, second, wday, week,
yday, year
The following objects are masked from 'package:base':
date, intersect, setdiff, union
Warning message:
"package 'forecast' was built under R version 3.6.3"Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
Warning message:
"package 'zoo' was built under R version 3.6.3"
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
Warning message:
"package 'fpp' was built under R version 3.6.3"Loading required package: fma
Warning message:
"package 'fma' was built under R version 3.6.3"Loading required package: expsmooth
Warning message:
"package 'expsmooth' was built under R version 3.6.3"Loading required package: lmtest
Warning message:
"package 'lmtest' was built under R version 3.6.3"Loading required package: tseries
Warning message:
"package 'tseries' was built under R version 3.6.3"Warning message:
"package 'xts' was built under R version 3.6.3"
Attaching package: 'xts'
The following objects are masked from 'package:data.table':
first, last
Warning message:
"package 'TSstudio' was built under R version 3.6.3"Warning message:
"package 'RcppRoll' was built under R version 3.6.3"
Attaching package: 'tsbox'
The following object is masked from 'package:TSstudio':
ts_plot
data = read.csv("IE360_Spring22_HW2_data.csv", stringsAsFactors = F) # reading the dataset
setDT(data) # to be able to use the features of datatable, make the format of the data datatable
data$Quarter = gsub("_Q", "-Q", data$Quarter) # Converting quarter names into more desired format by deleting _Q and
#adding -Q
colnames(data) <- c('Quarter','UGS','RNUV', 'NLPG', 'PU', 'PG', 'NUGV', 'NDGV', 'GNP_a', 'GNP_c', 'GNP_t')
# Changing column names into easier format
head(data) # check head of data
tail(data) # check tail of data
| Quarter | UGS | RNUV | NLPG | PU | PG | NUGV | NDGV | GNP_a | GNP_c | GNP_t |
|---|---|---|---|---|---|---|---|---|---|---|
| 2000-Q1 | 1 128 971 | 0.0146 | 940 000 | 469.03 | 355.69 | 4 647 500 | 281.9853 | 1 040 173 | 3 483 132 | 18 022 686 |
| 2000-Q2 | 1 199 569 | 0.0205 | 941 000 | 459.42 | 344.58 | 4 742 876 | 284.0813 | 1 760 460 | 4 525 451 | 21 797 130 |
| 2000-Q3 | 1 370 167 | 0.0207 | 943 500 | 439.98 | 327.21 | 4 840 931 | 286.7169 | 6 974 808 | 5 915 204 | 30 050 207 |
| 2000-Q4 | 1 127 548 | 0.0163 | 948 000 | 402.08 | 300.67 | 4 919 685 | 288.3137 | 3 267 125 | 4 929 778 | 24 480 153 |
| 2001-Q1 | 1 033 918 | 0.0071 | 950 000 | 411.58 | 305.75 | 4 954 754 | 287.6237 | 1 004 528 | 3 418 387 | 15 832 648 |
| 2001-Q2 | 1 019 754 | 0.0051 | 955 000 | 520.39 | 374.78 | 4 980 204 | 287.8814 | 1 449 357 | 4 359 831 | 20 296 918 |
| Quarter | UGS | RNUV | NLPG | PU | PG | NUGV | NDGV | GNP_a | GNP_c | GNP_t |
|---|---|---|---|---|---|---|---|---|---|---|
| 2006-Q3 | 946 783 | 0.0108 | 1 653 250 | 565.19 | 449.19 | 5 754 077 | 333.7144 | 6 800 688 | 7 124 204 | 34 992 138 |
| 2006-Q4 | 872 000 | 0.0133 | 1 696 000 | 565.19 | 449.19 | 5 825 866 | 339.2281 | 2 303 373 | 6 093 090 | 29 867 726 |
| 2007-Q1 | 0.0074 | 1 715 000 | 565.19 | 449.19 | 5 869 018 | 342.1729 | 1 132 973 | 4 857 305 | 24 413 807 | |
| 2007-Q2 | 0.0106 | 1 725 300 | 565.19 | 449.19 | 5 931 348 | 346.9407 | 1 570 703 | 5 852 404 | 27 597 857 | |
| 2007-Q3 | 0.0101 | 1 751 050 | 565.19 | 449.19 | 5 991 280 | 351.4449 | 7 140 722 | 7 480 414 | 36 741 745 | |
| 2007-Q4 | 0.0124 | 1 797 400 | 565.19 | 449.19 | 6 065 597 | 357.2902 | 2 418 541 | 6 397 745 | 31 361 112 |
data$UGS = gsub(" ", "", data$UGS)
data$NLPG = gsub(" ", "", data$NLPG)
data$NUGV = gsub(" ", "", data$NUGV)
data$GNP_a = gsub(" ", "", data$GNP_a)
data$GNP_c = gsub(" ", "", data$GNP_c)
data$GNP_t = gsub(" ", "", data$GNP_t)
# To make the figures numeric, removing the spaces in the figures needed.
data$UGS = as.numeric(data$UGS)
data$NLPG = as.numeric(data$NLPG)
data$NUGV = as.numeric(data$NUGV)
data$GNP_a = as.numeric(data$GNP_a)
data$GNP_c = as.numeric(data$GNP_c)
data$GNP_t = as.numeric(data$GNP_t)
# Make the figures of independent variables numeric
str(data) # Check the changes, I need to make all columns numeric except quarters that need to be date
Classes 'data.table' and 'data.frame': 32 obs. of 11 variables: $ Quarter: chr "2000-Q1" "2000-Q2" "2000-Q3" "2000-Q4" ... $ UGS : num 1128971 1199569 1370167 1127548 1033918 ... $ RNUV : num 0.0146 0.0205 0.0207 0.0163 0.0071 0.0051 0.0041 0.0048 0.0012 0.0032 ... $ NLPG : num 940000 941000 943500 948000 950000 ... $ PU : num 469 459 440 402 412 ... $ PG : num 356 345 327 301 306 ... $ NUGV : num 4647500 4742876 4840931 4919685 4954754 ... $ NDGV : num 282 284 287 288 288 ... $ GNP_a : num 1040173 1760460 6974808 3267125 1004528 ... $ GNP_c : num 3483132 4525451 5915204 4929778 3418387 ... $ GNP_t : num 18022686 21797130 30050207 24480153 15832648 ... - attr(*, ".internal.selfref")=<externalptr>
data$Quarter = as.Date(as.yearqtr(data$Quarter, format = "%Y-Q%q")) # Make quarters date format
data[, time:=1:.N] # add time info for time series
head(data) # check data if there is any problem
str(data)
| Quarter | UGS | RNUV | NLPG | PU | PG | NUGV | NDGV | GNP_a | GNP_c | GNP_t | time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000-01-01 | 1128971 | 0.0146 | 940000 | 469.03 | 355.69 | 4647500 | 281.9853 | 1040173 | 3483132 | 18022686 | 1 |
| 2000-04-01 | 1199569 | 0.0205 | 941000 | 459.42 | 344.58 | 4742876 | 284.0813 | 1760460 | 4525451 | 21797130 | 2 |
| 2000-07-01 | 1370167 | 0.0207 | 943500 | 439.98 | 327.21 | 4840931 | 286.7169 | 6974808 | 5915204 | 30050207 | 3 |
| 2000-10-01 | 1127548 | 0.0163 | 948000 | 402.08 | 300.67 | 4919685 | 288.3137 | 3267125 | 4929778 | 24480153 | 4 |
| 2001-01-01 | 1033918 | 0.0071 | 950000 | 411.58 | 305.75 | 4954754 | 287.6237 | 1004528 | 3418387 | 15832648 | 5 |
| 2001-04-01 | 1019754 | 0.0051 | 955000 | 520.39 | 374.78 | 4980204 | 287.8814 | 1449357 | 4359831 | 20296918 | 6 |
Classes 'data.table' and 'data.frame': 32 obs. of 12 variables: $ Quarter: Date, format: "2000-01-01" "2000-04-01" ... $ UGS : num 1128971 1199569 1370167 1127548 1033918 ... $ RNUV : num 0.0146 0.0205 0.0207 0.0163 0.0071 0.0051 0.0041 0.0048 0.0012 0.0032 ... $ NLPG : num 940000 941000 943500 948000 950000 ... $ PU : num 469 459 440 402 412 ... $ PG : num 356 345 327 301 306 ... $ NUGV : num 4647500 4742876 4840931 4919685 4954754 ... $ NDGV : num 282 284 287 288 288 ... $ GNP_a : num 1040173 1760460 6974808 3267125 1004528 ... $ GNP_c : num 3483132 4525451 5915204 4929778 3418387 ... $ GNP_t : num 18022686 21797130 30050207 24480153 15832648 ... $ time : int 1 2 3 4 5 6 7 8 9 10 ... - attr(*, ".internal.selfref")=<externalptr>
data_ts <- xts(data$UGS, data$Quarter) # Convert data frame to time series
TSstudio::ts_plot(data_ts,
title = "Unleaded Gasoline Sales (2000 - 2006)",
Xtitle = "Time",
Ytitle = "UGS in (1000 m3)")
The values of Unleaded Gasoline Sales per period decreases over time(trend); however, for seasonal changes, the variance does not change that much. There is a seasonal behaviour as it can be seen in the plot.
As it can be seen from the plot, the time series is not stationary with respect to its mean. The mean is not stationary, however, it seems that the time series with respect to the variance is stationary. There is no significant change of the variance over the time if we look at the differences of the top and bottom values of consecutive time periods.
For plotting, I used ts_plot() function which is a part of TSstudio library. The features of the function is very functional so I preferred this function.
new_data = head(data, 28) # Take values without 2007 to acf work properly
new_ts <- xts(new_data$UGS, new_data$Quarter) # Convert data frame to time series
data_tsnew <-ts_ts(new_ts) # Converting time series into ts type, not xts type
Acf(data_tsnew, lag.max = 28) # Autocorrelation plots for lag 1 to lag 28
The clear output of these autocorrelation values is that at lag 4, there is autocorrelation, which is expected due to the very nature of the time series. The time series is based on quarters so the autocorrelation at lag 4 is not suprising.
Also, there is autocorrelation at lag 1. This means that between consecutive quarters, the former affect the latter.
setnames(data, 'time', 'trend') # adding trend variable
head(data)
| Quarter | UGS | RNUV | NLPG | PU | PG | NUGV | NDGV | GNP_a | GNP_c | GNP_t | trend |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000-01-01 | 1128971 | 0.0146 | 940000 | 469.03 | 355.69 | 4647500 | 281.9853 | 1040173 | 3483132 | 18022686 | 1 |
| 2000-04-01 | 1199569 | 0.0205 | 941000 | 459.42 | 344.58 | 4742876 | 284.0813 | 1760460 | 4525451 | 21797130 | 2 |
| 2000-07-01 | 1370167 | 0.0207 | 943500 | 439.98 | 327.21 | 4840931 | 286.7169 | 6974808 | 5915204 | 30050207 | 3 |
| 2000-10-01 | 1127548 | 0.0163 | 948000 | 402.08 | 300.67 | 4919685 | 288.3137 | 3267125 | 4929778 | 24480153 | 4 |
| 2001-01-01 | 1033918 | 0.0071 | 950000 | 411.58 | 305.75 | 4954754 | 287.6237 | 1004528 | 3418387 | 15832648 | 5 |
| 2001-04-01 | 1019754 | 0.0051 | 955000 | 520.39 | 374.78 | 4980204 | 287.8814 | 1449357 | 4359831 | 20296918 | 6 |
quarter=seq(1,4,by=1)
data=cbind(data,quarter)
data$quarter = as.factor(data$quarter)
str(data)
Classes 'data.table' and 'data.frame': 32 obs. of 13 variables: $ Quarter: Date, format: "2000-01-01" "2000-04-01" ... $ UGS : num 1128971 1199569 1370167 1127548 1033918 ... $ RNUV : num 0.0146 0.0205 0.0207 0.0163 0.0071 0.0051 0.0041 0.0048 0.0012 0.0032 ... $ NLPG : num 940000 941000 943500 948000 950000 ... $ PU : num 469 459 440 402 412 ... $ PG : num 356 345 327 301 306 ... $ NUGV : num 4647500 4742876 4840931 4919685 4954754 ... $ NDGV : num 282 284 287 288 288 ... $ GNP_a : num 1040173 1760460 6974808 3267125 1004528 ... $ GNP_c : num 3483132 4525451 5915204 4929778 3418387 ... $ GNP_t : num 18022686 21797130 30050207 24480153 15832648 ... $ trend : int 1 2 3 4 5 6 7 8 9 10 ... $ quarter: Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ... - attr(*, ".internal.selfref")=<externalptr>
head(data)
| Quarter | UGS | RNUV | NLPG | PU | PG | NUGV | NDGV | GNP_a | GNP_c | GNP_t | trend | quarter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000-01-01 | 1128971 | 0.0146 | 940000 | 469.03 | 355.69 | 4647500 | 281.9853 | 1040173 | 3483132 | 18022686 | 1 | 1 |
| 2000-04-01 | 1199569 | 0.0205 | 941000 | 459.42 | 344.58 | 4742876 | 284.0813 | 1760460 | 4525451 | 21797130 | 2 | 2 |
| 2000-07-01 | 1370167 | 0.0207 | 943500 | 439.98 | 327.21 | 4840931 | 286.7169 | 6974808 | 5915204 | 30050207 | 3 | 3 |
| 2000-10-01 | 1127548 | 0.0163 | 948000 | 402.08 | 300.67 | 4919685 | 288.3137 | 3267125 | 4929778 | 24480153 | 4 | 4 |
| 2001-01-01 | 1033918 | 0.0071 | 950000 | 411.58 | 305.75 | 4954754 | 287.6237 | 1004528 | 3418387 | 15832648 | 5 | 1 |
| 2001-04-01 | 1019754 | 0.0051 | 955000 | 520.39 | 374.78 | 4980204 | 287.8814 | 1449357 | 4359831 | 20296918 | 6 | 2 |
data[, Xt_4 := 1:.N] # For adding a new variable
data$Xt_4[1:4] <- NA # To make the first four observations' values NA
num = 1:28
for (i in num) {
data$Xt_4[i+4] = data[,data$UGS[i]] # Make Y(t-4) available
}
tail(data,10) # Check if there is a problem
| Quarter | UGS | RNUV | NLPG | PU | PG | NUGV | NDGV | GNP_a | GNP_c | GNP_t | trend | quarter | Xt_4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2005-07-01 | 1040946 | 0.0162 | 1542000 | 609.03 | 471.86 | 5507485 | 316.2897 | 6476845 | 6784956 | 33325846 | 23 | 3 | 1094823 |
| 2005-10-01 | 793399 | 0.0158 | 1596000 | 565.19 | 449.19 | 5594288 | 322.2079 | 2193688 | 5802943 | 28445453 | 24 | 4 | 854517 |
| 2006-01-01 | 736580 | 0.0080 | 1620000 | 565.19 | 449.19 | 5635972 | 324.9761 | 1079022 | 4626005 | 23251245 | 25 | 1 | 762559 |
| 2006-04-01 | 877614 | 0.0114 | 1629500 | 565.19 | 449.19 | 5696182 | 329.4702 | 1495908 | 5573718 | 26283673 | 26 | 2 | 890693 |
| 2006-07-01 | 946783 | 0.0108 | 1653250 | 565.19 | 449.19 | 5754077 | 333.7144 | 6800688 | 7124204 | 34992138 | 27 | 3 | 1040946 |
| 2006-10-01 | 872000 | 0.0133 | 1696000 | 565.19 | 449.19 | 5825866 | 339.2281 | 2303373 | 6093090 | 29867726 | 28 | 4 | 793399 |
| 2007-01-01 | NA | 0.0074 | 1715000 | 565.19 | 449.19 | 5869018 | 342.1729 | 1132973 | 4857305 | 24413807 | 29 | 1 | 736580 |
| 2007-04-01 | NA | 0.0106 | 1725300 | 565.19 | 449.19 | 5931348 | 346.9407 | 1570703 | 5852404 | 27597857 | 30 | 2 | 877614 |
| 2007-07-01 | NA | 0.0101 | 1751050 | 565.19 | 449.19 | 5991280 | 351.4449 | 7140722 | 7480414 | 36741745 | 31 | 3 | 946783 |
| 2007-10-01 | NA | 0.0124 | 1797400 | 565.19 | 449.19 | 6065597 | 357.2902 | 2418541 | 6397745 | 31361112 | 32 | 4 | 872000 |
lm_1 = lm(UGS~trend,data) # Base model
lm_2 = lm(UGS~trend+quarter, data) # Trend + Seasonal Variables
lm_3 = lm(UGS~trend+quarter+Xt_4, data) # with the additional X(t-4) variable
lm_4 = lm(UGS~trend+Xt_4, data) # Only X(t-4) without quarter seasonality variable
summary(lm_1)
summary(lm_2)
summary(lm_3)
summary(lm_4)
Call:
lm(formula = UGS ~ trend, data = data)
Residuals:
Min 1Q Median 3Q Max
-199945 -73550 -21904 71224 237369
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1170777 43293 27.043 < 2e-16 ***
trend -12660 2608 -4.854 4.95e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 111500 on 26 degrees of freedom
(4 observations deleted due to missingness)
Multiple R-squared: 0.4754, Adjusted R-squared: 0.4552
F-statistic: 23.56 on 1 and 26 DF, p-value: 4.945e-05
Call:
lm(formula = UGS ~ trend + quarter, data = data)
Residuals:
Min 1Q Median 3Q Max
-81167 -31283 -3458 28640 94502
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1060372 23653 44.830 < 2e-16 ***
trend -13497 1147 -11.764 3.28e-11 ***
quarter2 121532 25987 4.677 0.000104 ***
quarter3 273619 26063 10.498 3.03e-10 ***
quarter4 95049 26189 3.629 0.001405 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 48570 on 23 degrees of freedom
(4 observations deleted due to missingness)
Multiple R-squared: 0.9119, Adjusted R-squared: 0.8966
F-statistic: 59.53 on 4 and 23 DF, p-value: 8.446e-12
Call:
lm(formula = UGS ~ trend + quarter + Xt_4, data = data)
Residuals:
Min 1Q Median 3Q Max
-55595 -18251 -287 17100 78068
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.142e+05 2.286e+05 3.999 0.000842 ***
trend -9.436e+03 3.243e+03 -2.909 0.009353 **
quarter2 1.156e+05 3.248e+04 3.561 0.002235 **
quarter3 2.465e+05 6.091e+04 4.048 0.000755 ***
quarter4 8.896e+04 2.848e+04 3.124 0.005862 **
Xt_4 7.867e-02 2.001e-01 0.393 0.698744
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 38700 on 18 degrees of freedom
(8 observations deleted due to missingness)
Multiple R-squared: 0.9235, Adjusted R-squared: 0.9022
F-statistic: 43.45 on 5 and 18 DF, p-value: 1.994e-09
Call:
lm(formula = UGS ~ trend + Xt_4, data = data)
Residuals:
Min 1Q Median 3Q Max
-66549 -42583 -989 47813 81027
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.371e+04 1.178e+05 0.710 0.485
trend 2.084e+03 1.960e+03 1.063 0.300
Xt_4 8.254e-01 9.262e-02 8.912 1.4e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 49740 on 21 degrees of freedom
(8 observations deleted due to missingness)
Multiple R-squared: 0.8525, Adjusted R-squared: 0.8385
F-statistic: 60.69 on 2 and 21 DF, p-value: 1.87e-09
lm_full = lm(UGS~.-Quarter, data) # With all variables
summary(lm_full)
Call:
lm(formula = UGS ~ . - Quarter, data = data)
Residuals:
Min 1Q Median 3Q Max
-48133 -9872 119 11680 32612
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.943e+06 2.247e+06 2.199 0.0554 .
RNUV 2.566e+06 3.220e+06 0.797 0.4460
NLPG -6.959e-01 3.145e-01 -2.213 0.0542 .
PU 1.607e+03 1.682e+03 0.956 0.3641
PG -3.673e+03 2.509e+03 -1.464 0.1772
NUGV -1.583e+00 9.700e-01 -1.632 0.1371
NDGV 1.698e+04 1.057e+04 1.606 0.1428
GNP_a -9.359e-02 6.681e-02 -1.401 0.1948
GNP_c 1.273e-01 1.260e-01 1.011 0.3386
GNP_t -2.284e-02 2.562e-02 -0.891 0.3959
trend 4.361e+04 2.481e+04 1.758 0.1126
quarter2 1.114e+05 1.011e+05 1.102 0.2992
quarter3 7.557e+05 3.867e+05 1.954 0.0824 .
quarter4 1.998e+05 1.286e+05 1.553 0.1547
Xt_4 -4.198e-02 2.893e-01 -0.145 0.8878
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 32730 on 9 degrees of freedom
(8 observations deleted due to missingness)
Multiple R-squared: 0.9726, Adjusted R-squared: 0.9301
F-statistic: 22.85 on 14 and 9 DF, p-value: 2.559e-05
lm_fullw1 = lm(UGS~.-Quarter-GNP_a, data) # With all variables except GNP_a
lm_fullw2 = lm(UGS~.-Quarter-GNP_a-GNP_c, data) # With all variables except GNP_a and GNP_c
lm_fullw3 = lm(UGS~.-Quarter-GNP_a-GNP_c-GNP_t, data) # With all variables except GNP_a, GNP_c and GNP_t
summary(lm_fullw1)
summary(lm_fullw2)
summary(lm_fullw3)
Call:
lm(formula = UGS ~ . - Quarter - GNP_a, data = data)
Residuals:
Min 1Q Median 3Q Max
-52878 -10525 573 11967 35729
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.526e+06 2.101e+06 1.678 0.124
RNUV 2.910e+05 2.911e+06 0.100 0.922
NLPG -4.624e-01 2.792e-01 -1.656 0.129
PU 9.924e+02 1.700e+03 0.584 0.572
PG -2.254e+03 2.403e+03 -0.938 0.370
NUGV -1.016e+00 9.230e-01 -1.101 0.297
NDGV 1.100e+04 1.013e+04 1.086 0.303
GNP_c 1.407e-01 1.315e-01 1.070 0.310
GNP_t -3.796e-02 2.433e-02 -1.561 0.150
trend 2.917e+04 2.363e+04 1.235 0.245
quarter2 8.881e+04 1.045e+05 0.850 0.415
quarter3 3.369e+05 2.569e+05 1.312 0.219
quarter4 1.506e+05 1.296e+05 1.162 0.272
Xt_4 6.341e-02 2.924e-01 0.217 0.833
Residual standard error: 34270 on 10 degrees of freedom
(8 observations deleted due to missingness)
Multiple R-squared: 0.9667, Adjusted R-squared: 0.9233
F-statistic: 22.31 on 13 and 10 DF, p-value: 1.215e-05
Call:
lm(formula = UGS ~ . - Quarter - GNP_a - GNP_c, data = data)
Residuals:
Min 1Q Median 3Q Max
-55317 -8916 2097 17545 33464
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.553e+06 1.882e+06 2.420 0.03402 *
RNUV 9.202e+05 2.870e+06 0.321 0.75446
NLPG -3.906e-01 2.728e-01 -1.432 0.17995
PU -2.674e+02 1.234e+03 -0.217 0.83235
PG -4.421e+02 1.716e+03 -0.258 0.80144
NUGV -1.555e+00 7.788e-01 -1.997 0.07123 .
NDGV 1.751e+04 8.147e+03 2.150 0.05467 .
GNP_t -1.877e-02 1.654e-02 -1.135 0.28058
trend 2.864e+04 2.378e+04 1.205 0.25365
quarter2 1.823e+05 5.777e+04 3.155 0.00915 **
quarter3 5.091e+05 2.015e+05 2.526 0.02815 *
quarter4 2.305e+05 1.066e+05 2.163 0.05341 .
Xt_4 -8.673e-02 2.582e-01 -0.336 0.74331
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 34500 on 11 degrees of freedom
(8 observations deleted due to missingness)
Multiple R-squared: 0.9628, Adjusted R-squared: 0.9223
F-statistic: 23.76 on 12 and 11 DF, p-value: 4.006e-06
Call:
lm(formula = UGS ~ . - Quarter - GNP_a - GNP_c - GNP_t, data = data)
Residuals:
Min 1Q Median 3Q Max
-60729 -9884 1369 17736 36417
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.344e+06 1.570e+06 2.131 0.05449 .
RNUV 1.046e+06 2.902e+06 0.361 0.72466
NLPG -3.074e-01 2.658e-01 -1.156 0.27013
PU -1.902e+02 1.247e+03 -0.153 0.88123
PG -6.600e+02 1.726e+03 -0.382 0.70880
NUGV -1.160e+00 7.050e-01 -1.645 0.12584
NDGV 1.381e+04 7.552e+03 1.828 0.09247 .
trend 1.441e+04 2.044e+04 0.705 0.49432
quarter2 1.321e+05 3.765e+04 3.510 0.00430 **
quarter3 2.965e+05 7.516e+04 3.946 0.00194 **
quarter4 1.148e+05 3.134e+04 3.663 0.00325 **
Xt_4 -5.284e-02 2.596e-01 -0.204 0.84211
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 34910 on 12 degrees of freedom
(8 observations deleted due to missingness)
Multiple R-squared: 0.9585, Adjusted R-squared: 0.9205
F-statistic: 25.2 on 11 and 12 DF, p-value: 1.371e-06
lm_fullw4 = lm(UGS~.-Quarter-GNP_a-GNP_c-GNP_t-NLPG, data) # Removing NLPG variable
summary(lm_fullw4)
lm_fullw5 = lm(UGS~.-Quarter-GNP_a-GNP_c-GNP_t-NLPG-NDGV, data) # Removing NDGV variable
summary(lm_fullw5)
lm_fullw6 = lm(UGS~.-Quarter-GNP_a-GNP_c-GNP_t-NLPG-NDGV-NUGV, data) # Removing NUGV variable
summary(lm_fullw6)
Call:
lm(formula = UGS ~ . - Quarter - GNP_a - GNP_c - GNP_t - NLPG,
data = data)
Residuals:
Min 1Q Median 3Q Max
-57277 -12148 2360 16117 49332
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.819e+06 8.609e+05 2.113 0.05455 .
RNUV 1.093e+05 2.822e+06 0.039 0.96969
PU -7.360e+02 1.168e+03 -0.630 0.53967
PG 2.862e+02 1.539e+03 0.186 0.85533
NUGV -4.726e-01 3.839e-01 -1.231 0.24007
NDGV 6.538e+03 4.238e+03 1.543 0.14690
trend -7.823e+03 7.025e+03 -1.113 0.28566
quarter2 1.472e+05 3.577e+04 4.117 0.00121 **
quarter3 3.113e+05 7.502e+04 4.150 0.00114 **
quarter4 1.137e+05 3.173e+04 3.585 0.00333 **
Xt_4 -8.733e-02 2.612e-01 -0.334 0.74342
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 35360 on 13 degrees of freedom
(8 observations deleted due to missingness)
Multiple R-squared: 0.9539, Adjusted R-squared: 0.9184
F-statistic: 26.89 on 10 and 13 DF, p-value: 4.655e-07
Call:
lm(formula = UGS ~ . - Quarter - GNP_a - GNP_c - GNP_t - NLPG -
NDGV, data = data)
Residuals:
Min 1Q Median 3Q Max
-62305 -19990 7019 19643 56697
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.283e+05 5.151e+05 1.414 0.17920
RNUV 6.684e+05 2.933e+06 0.228 0.82303
PU -4.555e+02 1.210e+03 -0.376 0.71220
PG -1.352e+02 1.587e+03 -0.085 0.93332
NUGV 9.428e-02 1.163e-01 0.810 0.43122
trend -1.060e+04 7.118e+03 -1.489 0.15869
quarter2 1.254e+05 3.443e+04 3.642 0.00266 **
quarter3 2.639e+05 7.172e+04 3.679 0.00248 **
quarter4 8.924e+04 2.879e+04 3.099 0.00784 **
Xt_4 6.867e-02 2.524e-01 0.272 0.78953
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 37060 on 14 degrees of freedom
(8 observations deleted due to missingness)
Multiple R-squared: 0.9454, Adjusted R-squared: 0.9104
F-statistic: 26.95 on 9 and 14 DF, p-value: 2.464e-07
Call:
lm(formula = UGS ~ . - Quarter - GNP_a - GNP_c - GNP_t - NLPG -
NDGV - NUGV, data = data)
Residuals:
Min 1Q Median 3Q Max
-53449 -25050 4236 18808 69345
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.076e+06 2.820e+05 3.815 0.00169 **
RNUV 8.800e+05 2.888e+06 0.305 0.76477
PU -4.110e+02 1.195e+03 -0.344 0.73562
PG -1.609e+02 1.569e+03 -0.103 0.91964
trend -6.281e+03 4.668e+03 -1.346 0.19839
quarter2 1.158e+05 3.196e+04 3.624 0.00250 **
quarter3 2.424e+05 6.590e+04 3.679 0.00223 **
quarter4 8.380e+04 2.768e+04 3.028 0.00848 **
Xt_4 1.408e-01 2.334e-01 0.603 0.55537
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 36630 on 15 degrees of freedom
(8 observations deleted due to missingness)
Multiple R-squared: 0.9429, Adjusted R-squared: 0.9124
F-statistic: 30.95 on 8 and 15 DF, p-value: 5.744e-08
lm_finaltry = lm(UGS~trend+quarter+PU, data) # Average price of a liter of unleaded gasoline in a quarter can affect
summary(lm_finaltry)
Call:
lm(formula = UGS ~ trend + quarter + PU, data = data)
Residuals:
Min 1Q Median 3Q Max
-61011 -30317 -3850 19446 96380
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1266436.9 114578.3 11.053 1.89e-10 ***
trend -11147.0 1683.9 -6.620 1.17e-06 ***
quarter2 128295.1 25019.3 5.128 3.87e-05 ***
quarter3 282585.0 25295.3 11.171 1.55e-10 ***
quarter4 92393.3 24980.3 3.699 0.00125 **
PU -474.8 258.9 -1.834 0.08018 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 46250 on 22 degrees of freedom
(4 observations deleted due to missingness)
Multiple R-squared: 0.9236, Adjusted R-squared: 0.9062
F-statistic: 53.19 on 5 and 22 DF, p-value: 1.497e-11
lm_finaltry2 = lm(UGS~trend+quarter+PU+NUGV, data) # : Number of unleaded gasoline using vehicles in the traffic can affect
summary(lm_finaltry2)
Call:
lm(formula = UGS ~ trend + quarter + PU + NUGV, data = data)
Residuals:
Min 1Q Median 3Q Max
-68465 -32362 -4114 19606 102608
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.317e+05 5.677e+05 1.465 0.15772
trend -1.440e+04 4.493e+03 -3.205 0.00425 **
quarter2 1.278e+05 2.525e+04 5.059 5.21e-05 ***
quarter3 2.811e+05 2.560e+04 10.980 3.68e-10 ***
quarter4 8.806e+04 2.581e+04 3.412 0.00262 **
PU -4.986e+02 2.629e+02 -1.896 0.07176 .
NUGV 9.531e-02 1.218e-01 0.782 0.44280
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 46660 on 21 degrees of freedom
(4 observations deleted due to missingness)
Multiple R-squared: 0.9258, Adjusted R-squared: 0.9046
F-statistic: 43.65 on 6 and 21 DF, p-value: 8.654e-11
lm_final = lm(UGS~trend+quarter+PU, data) # Trying several models, a wise model for me
summary(lm_final)
checkresiduals(lm_final)
Call:
lm(formula = UGS ~ trend + quarter + PU, data = data)
Residuals:
Min 1Q Median 3Q Max
-61011 -30317 -3850 19446 96380
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1266436.9 114578.3 11.053 1.89e-10 ***
trend -11147.0 1683.9 -6.620 1.17e-06 ***
quarter2 128295.1 25019.3 5.128 3.87e-05 ***
quarter3 282585.0 25295.3 11.171 1.55e-10 ***
quarter4 92393.3 24980.3 3.699 0.00125 **
PU -474.8 258.9 -1.834 0.08018 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 46250 on 22 degrees of freedom
(4 observations deleted due to missingness)
Multiple R-squared: 0.9236, Adjusted R-squared: 0.9062
F-statistic: 53.19 on 5 and 22 DF, p-value: 1.497e-11
Breusch-Godfrey test for serial correlation of order up to 9 data: Residuals LM test = 8.8157, df = 9, p-value = 0.4545
predicted = predict(lm_final, data[1:32])
data = data[, predicted:=predicted]
paste("2007 Q1:", data$predicted[29])
paste("2007 Q2:", data$predicted[30])
paste("2007 Q3:", data$predicted[31])
paste("2007 Q4:", data$predicted[32])
Plot of the Predicted Values with the first 6 years:
data_withprediction <- xts(data$UGS, data$Quarter)
TSstudio::ts_plot(data_withprediction,
title = "Unleaded Gasoline Sales Predicted (2000 - 2007)",
Xtitle = "Time",
Ytitle = "UGS in (1000 m3)")
ggplot(data = data, aes(x = Quarter)) +
geom_line(aes(y = UGS, colour = "Actual")) +
geom_line(aes(y = predicted, colour = "Predicted")) +
scale_colour_manual("",
breaks = c("Actual", "Predicted"),
values = c("red", "green")) +
xlab(" ") +
labs(title="Actual vs Predicted")
Warning message: "Removed 4 row(s) containing missing values (geom_path)."
As seen from the plots, the prediction seems reasonable. Althought the model can be improved by adding some new variables, I think that I reached a decent model within the homework and given variables.